Import¶

In [ ]:
import warnings
from sklearn.exceptions import UndefinedMetricWarning

warnings.filterwarnings("ignore", category=UndefinedMetricWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

import plotly
plotly.offline.init_notebook_mode()
In [ ]:
from file_py.run_log_parser import RunLogParser
from file_py.csv_preprocessing_scaler import CsvPreprocessingScaler

from file_py.plots import Plots

from file_py.utils import MarkdownHelper

from file_py.attack_log_unification import AttackLogUnification
from file_py.stat_severity import StatSeverity
from file_py.attack_pattern_analyzer import AttackPatternAnalyzer
from file_py.signatures_patterns import SignaturePatterns

from file_py.signature_stats_calculator import SignatureStatsCalculator
from file_py.sigma_rule_analysis import SigmaRuleAnalysis

from file_py.plots_single_attack import PlotsSingleAttack

from file_py.correlation_matrix_plots import CorrelationMatrixPlots

from file_py.preprocessing_train_test_split import PreprocessingTrainTestSplit
from file_py.initial_training import InitialTraining
from file_py.hyperparameter_tuning import HyperparameterTuning
from file_py.advanced_models import AdvancedModels
from file_py.deep_learning_model import DeepLearningModel
from file_py.model_evaluator import ModelEvaluator

FILE LOADING¶

Replace the current file paths with the paths of the files of interest here:

In [ ]:
# FILE CONTAINING THE LOGS
df = CsvPreprocessingScaler.read_csv_file("file_csv/LogSplunkWF_03_07.csv")

# FILE WITH THE ATTACK START AND END DATES
files = ['file_csv/attackLog_03_07.csv']

Preprocessing¶

In [ ]:
df_raw = CsvPreprocessingScaler.RawPreprocessing(df)
df_Le = CsvPreprocessingScaler.LEPreprocessing(df)
df_OH = CsvPreprocessingScaler.OhePreprocessing(df)
In [ ]:
df_std_LE = CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.LEPreprocessing(df))
df_std_OH = CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.OhePreprocessing(df))

Test¶

In [ ]:
attack_log_path = AttackLogUnification.attack_log_together(files, 'file_csv/attackLog_03_07.csv')
In [ ]:
result_df_Le = RunLogParser.process_attacks(attack_log_path, CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.LEPreprocessing(df)))
result_df_OH = RunLogParser.process_attacks(attack_log_path, CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.OhePreprocessing(df)))
result_df_Raw = RunLogParser.process_attacks(attack_log_path, CsvPreprocessingScaler.RawPreprocessing(df))

Graphic Analysis of Attacks¶

In [ ]:
Plots.plot_cake_attack(result_df_Raw)
In [ ]:
Plots.plot_top_10_signatures(result_df_Raw)
Out[ ]:

Here we can see that the rules that fired most often are generally also the ones that actually responded to the most attacks, and the ones that fired without matching any attack most often.

In [ ]:
Plots.plot_precision_recall(result_df_Raw)

The first plot shows each rule's precision, i.e. the proportion of correct activations out of all of its activations.
Higher precision means the rule is more accurate at detecting real attacks.

The second plot shows recall, i.e. the proportion of real attacks detected by the rule out of all real attacks.
Higher recall means the rule is more effective at detecting all possible attacks.
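These per-rule metrics can be reproduced with plain pandas; the following is a minimal sketch on hypothetical toy data, assuming only that the result DataFrame carries a `signature` column and a binary `corrisponde_ad_attacco` flag as in the notebook:

```python
import pandas as pd

# Toy activation log: one row per rule firing; corrisponde_ad_attacco
# marks whether the firing matched a real attack (column name from the notebook).
df = pd.DataFrame({
    "signature": ["rule-a", "rule-a", "rule-a", "rule-b", "rule-b"],
    "corrisponde_ad_attacco": [1, 1, 0, 0, 0],
})
total_attack_hits = df["corrisponde_ad_attacco"].sum()  # all true-attack firings

stats = df.groupby("signature")["corrisponde_ad_attacco"].agg(["sum", "count"])
stats["precision"] = stats["sum"] / stats["count"]        # correct firings / all firings
stats["recall"] = stats["sum"] / total_attack_hits        # correct firings / all attack hits
print(stats)
```

With this toy data `rule-a` gets precision 2/3 and recall 1.0, while `rule-b` never matched an attack.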

In [ ]:
Plots.plot_distributions(result_df_Raw)
In [ ]:
Plots.plot_value_counts_per_unique(result_df_Raw)
In [ ]:
variables = MarkdownHelper.create_value_counts_variables(result_df_Raw)
MarkdownHelper.display_value_counts_text(variables)
This chart, in turn, lets us draw a number of conclusions.

Out of 31 distinct rules:

  • 22 fired in response to AT LEAST one real attack. Of these:

    • 4 fired more often for non-attacks than for attacks (generic rules);
    • 6 fired the same number of times for attacks and non-attacks;
    • 12 fired more often in response to attacks than to non-attacks (specific rules).
  • 9 fired without ever responding to an attack.

    They are: ['load-of-dbghelp/dbgcore-dll-from-suspicious-process', 'proc-start-suspicious-wmiprvse-child-process', 'proc-start-wmiservice-child', 'proc-start-cobaltstrike-load-by-rundll32', 'proc-start-powershell-base64-encoded-invoke-keyword', 'proc-start-hacktool-mimikatz-execution', 'proc-start-suspicious-powershell-parameter-substring', 'proc-start-lolbas-compile', 'proc-start-suspicious-process-created-via-wmic.exe']
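The generic/specific split above can be derived directly from the per-rule activation counts; a minimal sketch on hypothetical toy data, assuming the same `signature` and `corrisponde_ad_attacco` columns:

```python
import pandas as pd

# Toy firing log; corrisponde_ad_attacco flags real-attack activations.
df = pd.DataFrame({
    "signature": ["r1"] * 5 + ["r2"] * 4 + ["r3"] * 2,
    "corrisponde_ad_attacco": [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
})

counts = df.groupby("signature")["corrisponde_ad_attacco"].agg(
    attacks="sum", total="count")
counts["non_attacks"] = counts["total"] - counts["attacks"]

def classify(row):
    """Label a rule following the grouping used in the text above."""
    if row["attacks"] == 0:
        return "never matched an attack"
    if row["attacks"] > row["non_attacks"]:
        return "specific"
    if row["attacks"] < row["non_attacks"]:
        return "generic"
    return "balanced"

counts["class"] = counts.apply(classify, axis=1)
print(counts)
```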

Severity Analysis per Attack¶

In [ ]:
event_df = RunLogParser.create_event_df(attack_log_path, result_df_Raw)

Creation of the event_df DataFrame with the new severity_max, severity_min and severity_mean columns.

Plots¶

In [ ]:
StatSeverity.plot_stat_severity(event_df)

This plot shows, for each attack in the dataset, its maximum, minimum and mean severity.

In [ ]:
analyzer = AttackPatternAnalyzer(event_df)
In [ ]:
# CHOOSE A SEVERITY VALUE FOR THE RULES TO CONSIDER
severity_value = 73

# CHOOSE THE NUMBER OF ATTACKS TO CONSIDER BEFORE THE RULES WITH THE CHOSEN SEVERITY
num_attacks = 10
In [ ]:
analyzer.pattern_before_attack(num_attacks=num_attacks, severity_value=severity_value)

These plots consider the attacks preceding all attacks that have a given mean severity, and display all the corresponding values of "RuleAnnotation.mitre_attack.id", "signature", "EventType", "tag" and "severity_id".

To choose how many attacks before the ones of interest to consider, simply set the "num_attacks" variable to the desired number; to choose the mean severity value of interest, set the "severity_value" variable instead.
All attacks whose mean severity lies within 2.5 below and 2.5 above the value assigned to "severity_value" are taken into account.
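The ±2.5 window around `severity_value` amounts to a simple filter; a sketch on hypothetical toy data, assuming `event_df` carries one `severity_mean` value per attack as created earlier:

```python
import pandas as pd

# Toy event table with one mean severity per attack (hypothetical values).
event_df = pd.DataFrame({
    "attack_id": [1, 2, 3, 4],
    "severity_mean": [70.0, 73.0, 75.4, 76.0],
})

severity_value = 73   # target mean severity, as set in the notebook
tolerance = 2.5       # attacks within +/- 2.5 of the target are kept

mask = (event_df["severity_mean"] - severity_value).abs() <= tolerance
selected = event_df[mask]
print(selected["attack_id"].tolist())  # attacks 1 and 4 fall outside the window
```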

Rule Robustness¶

In [ ]:
signature_stats = SignatureStatsCalculator.create_signature_stats(event_df, result_df_Raw)
signature_stats
Out[ ]:
signature Indice_Diff Media_Differenza_Severity_min Media_Differenza_Severity_mean Media_Differenza_Severity_max N_Max_Sev_Diff_15 N_Attacchi_Non_rilevati
0 net-connect-80-443-non-browser 0.026178 -4.807692 -0.692819 0.000000 0 0
1 net-connect-Windows-processes 0.002101 0.000000 -0.121006 0.000000 0 0
2 net-connect-suspicious-target-names 0.001212 0.000000 0.148516 0.000000 0 0
3 proc-start-suspicious-powershell-download-and-... 0.005523 0.000000 0.148516 0.000000 0 0
4 proc-start-powershell-download-and-execution-c... 0.007495 0.000000 0.175519 0.000000 0 4
5 proc-start-dumping-of-sensitive-hives-via-reg.exe 0.002637 0.000000 0.545410 0.000000 0 0
6 proc-start-dir-user-writeable 0.023308 -3.125000 -0.893017 0.000000 0 2
7 suspicious-unsigned-dbghelp/dbgcore-dll-loaded 0.045368 -1.041667 3.040575 3.125000 2 2
8 reg-key-create-service 0.002792 0.000000 -0.025202 0.000000 0 0
9 proc-start-malicious-powershell-commandlets-pr... 0.013459 0.000000 0.000000 0.000000 0 0
10 proc-start-potential-meterpreter/cobaltstrike-... 0.000770 0.000000 0.000000 0.000000 0 0
11 proc-start-abused-debug-privilege-by-arbitrary... 0.000562 0.000000 0.000000 0.000000 0 0
12 proc-start-potential-cobaltstrike-process-patt... 0.000370 0.000000 0.000000 0.000000 0 0
13 proc-start-suspicious-new-service-creation 0.000562 0.000000 0.000000 0.000000 0 0
14 proc-start-lolbas-compile 0.000000 0.000000 0.000000 0.000000 0 0
15 proc-start-lolbas-alternate-data-streams 0.000000 0.000000 0.000000 0.000000 0 2
16 proc-start-potential-winapi-calls-via-commandline 0.015191 0.000000 0.694444 1.041667 1 2
17 reg-value-write-cert-change 0.022747 0.000000 2.264957 2.884615 2 0
18 proc-start-hacktool-mimikatz-execution 0.000000 0.000000 0.000000 0.000000 0 0
19 proc-start-cobaltstrike-load-by-rundll32 0.000000 0.000000 0.000000 0.000000 0 0
20 proc-start-suspicious-wmiprvse-child-process 0.000000 0.000000 0.000000 0.000000 0 0
21 proc-start-wmiservice-child 0.000000 0.000000 0.000000 0.000000 0 0
22 proc-start-suspicious-process-created-via-wmic... 0.000000 0.000000 0.000000 0.000000 0 0
23 proc-start-powershell-base64-encoded-invoke-ke... 0.000000 0.000000 0.000000 0.000000 0 0
24 proc-start-suspicious-powershell-parameter-sub... 0.000000 0.000000 0.000000 0.000000 0 0
25 proc-start-rundll32-execution-without-dll-file 0.001923 0.000000 0.000000 0.000000 0 0
26 proc-start-suspicious-key-manager-access 0.001923 0.000000 0.000000 0.000000 0 0
27 proc-start-potentially-suspicious-powershell-c... 0.001923 0.000000 0.000000 0.000000 0 0
28 load-of-dbghelp/dbgcore-dll-from-suspicious-pr... 0.000000 0.000000 0.000000 0.000000 0 0
29 proc-start-process-memory-dump-via-comsvcs.dll 0.004006 0.000000 0.000000 0.000000 0 0
30 proc-start-potential-credential-dumping-attemp... 0.002564 0.000000 0.000000 0.000000 0 0

signature_stats is a dataset that shows, for each rule, which severity changes its removal would cause in the log dataset and whether any attacks would then go undetected.

In [ ]:
analysis = SigmaRuleAnalysis(signature_stats)
analysis.plots_sigma_rule_analysis()

Graphic Analysis of Attacks for Chosen Rule¶

In [ ]:
# CHOOSE THE RULE TO ANALYZE
regola_scelta = 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded'
In [ ]:
PlotsSingleAttack.analyze_rule_activations(result_df_Raw, regola_scelta)

Depending on the rule being analyzed, these plots show:

  • the frequency of the rule's activations (attacks and non-attacks) split into 5-minute intervals;
  • attacks and non-attacks broken down by:
    • RuleAnnotation.mitre_attack.id,
    • EventType,
    • severity,
    • tag,
    • parent_process_id,
    • process_id.
In [ ]:
# CHOOSE THE NUMBER OF EVENTS PRECEDING THE RULE TO ANALYZE
eventi_da_considerare = 5
In [ ]:
PlotsSingleAttack.patterns_before_activation(result_df_Raw, regola_scelta, eventi_da_considerare)

These plots show, respectively, the rules, attacks, EventTypes, tags, parent processes, processes and severities of the events immediately preceding the first activations of the chosen rule.
The number of events to consider is set by assigning the desired number to the eventi_da_considerare variable.

By "first activation of a rule" we mean an event in which at least one value of the columns signature, RuleAnnotation.mitre_attack.id, EventType, tag, severity_id, parent_process_id or process_id differs from the previous event (only the _time and corrisponde_ad_attacco columns are not considered).
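The "first activation" definition above amounts to a change-detection pass over consecutive events; a minimal sketch on hypothetical toy data (only two of the tracked columns are used here for brevity):

```python
import pandas as pd

# Toy event stream; a "first activation" is a row whose tracked columns
# differ from those of the previous row. The _time and corrisponde_ad_attacco
# columns are excluded from the comparison, as described in the text.
events = pd.DataFrame({
    "_time": [1, 2, 3, 4],
    "signature": ["r1", "r1", "r2", "r2"],
    "severity_id": [3, 3, 5, 5],
    "corrisponde_ad_attacco": [0, 1, 1, 1],
})

tracked = ["signature", "severity_id"]  # subset of the columns listed above
changed = events[tracked].ne(events[tracked].shift()).any(axis=1)
first_activations = events[changed]
print(first_activations.index.tolist())  # rows 0 and 2 start a new activation
```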

Patterns¶

In [ ]:
signature_patterns = SignaturePatterns.recognize_signatures_patterns(result_df_Raw)
signature_patterns
Pattern: ('proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 252
Pattern: ('proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 248
Pattern: ('proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 244
Pattern: ('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'), Frequenza: 26
Pattern: ('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'), Frequenza: 25
Pattern: ('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'), Frequenza: 24
Pattern: ('reg-value-write-cert-change', 'reg-value-write-cert-change'), Frequenza: 9
Pattern: ('reg-value-write-cert-change', 'reg-value-write-cert-change', 'reg-value-write-cert-change'), Frequenza: 7
Pattern: ('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles'), Frequenza: 6
Pattern: ('reg-value-write-cert-change', 'reg-value-write-cert-change', 'reg-value-write-cert-change', 'reg-value-write-cert-change'), Frequenza: 5
Pattern: ('proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-dir-user-writeable', 'proc-start-dir-user-writeable'), Frequenza: 3
Pattern: ('proc-start-potential-meterpreter/cobaltstrike-activity', 'proc-start-potential-meterpreter/cobaltstrike-activity'), Frequenza: 3

In signature_patterns we see the sequences of 3, 4 or 5 rules, ordered from most to least frequent, that repeated several times across the various attacks and that never appear among the false-attack sequences.
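Mining such sequences can be sketched as n-gram counting over the per-attack rule streams, discarding any pattern that also occurs in the non-attack streams; the rule names and streams below are hypothetical toy data:

```python
from collections import Counter

def ngrams(seq, n):
    """All length-n contiguous windows of a rule sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# Toy rule streams (hypothetical names), one list per attack / non-attack.
attack_seqs = [["a", "a", "b"], ["a", "a", "b", "c"]]
benign_seqs = [["b", "c"]]

# Count every pattern seen inside real attacks.
counts = Counter()
for seq in attack_seqs:
    for n in (2, 3):
        counts.update(ngrams(seq, n))

# Collect every pattern seen among the non-attack sequences.
benign = set()
for seq in benign_seqs:
    for n in (2, 3):
        benign.update(ngrams(seq, n))

# Keep only attack-exclusive patterns, most frequent first.
patterns = {p: c for p, c in counts.items() if p not in benign}
print(sorted(patterns.items(), key=lambda kv: -kv[1]))
```

Here the benign pattern `('b', 'c')` is dropped even though it occurs inside an attack stream, mirroring the "never appears among false-attack sequences" constraint.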

With specified severity value¶

In [ ]:
result_pattern_inside_attack = analyzer.pattern_inside_attack(severity_value=severity_value)
result_pattern_inside_attack
MITRE ATT&CK IDs:
1-digit repetitions:

2-digits sequences:
  ('T1036', 'T1036'): 1
  ('T1003.001', 'T1003.001'): 1

3-digits sequences:
  ('T1059.001', 'T1059', 'T1482'): 2
  ('T1003.001', 'T1003.001', 'T1003.001'): 2
  ('T1218.011', 'T1555.004', 'T1059.001'): 1
  ('T1003.001', 'T1003.001', 'T1106'): 1
  ('T1134.001', 'T1134.001', 'T1134.002'): 1
  ('T1003.002', 'T1003.002', 'T1003.002'): 1

SIGNATURES:
1-digit repetitions:

2-digits sequences:
  ('proc-start-process-memory-dump-via-comsvcs.dll', 'proc-start-process-memory-dump-via-comsvcs.dll'): 1
  ('suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded'): 1

3-digits sequences:
  ('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation'): 2
  ('suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded'): 2
  ('proc-start-rundll32-execution-without-dll-file', 'proc-start-suspicious-key-manager-access', 'proc-start-potentially-suspicious-powershell-child-processes'): 1
  ('suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'proc-start-potential-winapi-calls-via-commandline'): 1
  ('proc-start-potential-meterpreter/cobaltstrike-activity', 'proc-start-potential-meterpreter/cobaltstrike-activity', 'proc-start-potential-meterpreter/cobaltstrike-activity'): 1
  ('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'): 1

In result_pattern_inside_attack we see:

  • '1-digit repetitions', i.e. the repetitions of mitre_attack.id and signature at the head of attacks with at most 3 mitre IDs or signatures recorded;
  • '2-digits sequences', i.e. the sequences of mitre_attack.id and signature at the head of attacks with between 4 and 5 mitre IDs or signatures;
  • '3-digits sequences', i.e. the sequences of mitre_attack.id and signature at the head of attacks with more than 5 mitre IDs or signatures (5 excluded).
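The three buckets can be expressed as a small helper; this sketch only restates the grouping rule described above (the function name is ours, not part of the notebook's modules):

```python
def bucket(n_recorded):
    """Bucket an attack by how many mitre IDs / signatures were
    recorded at its head, following the grouping in the text."""
    if n_recorded <= 3:
        return "1-digit repetitions"
    if n_recorded <= 5:
        return "2-digits sequences"
    return "3-digits sequences"   # strictly more than 5

for n in (2, 4, 7):
    print(n, "->", bucket(n))
```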

Correlation Matrix¶

In [ ]:
CorrelationMatrixPlots.plot_correlation_matrix(result_df_Le, 'Correlation Matrix (Label Encoding)')
CorrelationMatrixPlots.plot_correlation_matrix_big(result_df_OH, 'Correlation Matrix (OneHot Encoding)')

ML¶

OneHot¶

In [ ]:
# Split data
X_train_OH, X_test_OH, y_train_OH, y_test_OH = PreprocessingTrainTestSplit.split_data(result_df_OH, "corrisponde_ad_attacco")

# Initial model training and evaluation
InitialTraining.train_and_evaluate_initial_models(X_train_OH, y_train_OH, X_test_OH, y_test_OH)

# Hyperparameter tuning
best_models_OH = HyperparameterTuning.tune_hyperparameters(X_train_OH, y_train_OH)

# Evaluate best models on test set
evaluator_OH = ModelEvaluator(best_models_OH)
evaluation_results_OH = evaluator_OH.evaluate_models(X_test_OH, y_test_OH)

# Train XGBoost model
AdvancedModels.train_xgboost(X_train_OH, y_train_OH, X_test_OH, y_test_OH)

# Train deep learning model
DeepLearningModel.train_deep_learning_model(X_train_OH, y_train_OH, X_test_OH, y_test_OH)
Decision Tree Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.67      0.74        39
           1       0.89      0.95      0.92       111

    accuracy                           0.88       150
   macro avg       0.86      0.81      0.83       150
weighted avg       0.88      0.88      0.88       150


AdaBoost Classification Report:
              precision    recall  f1-score   support

           0       0.79      0.59      0.68        39
           1       0.87      0.95      0.91       111

    accuracy                           0.85       150
   macro avg       0.83      0.77      0.79       150
weighted avg       0.85      0.85      0.85       150


XGBoost Classification Report:
              precision    recall  f1-score   support

           0       0.85      0.72      0.78        39
           1       0.91      0.95      0.93       111

    accuracy                           0.89       150
   macro avg       0.88      0.84      0.85       150
weighted avg       0.89      0.89      0.89       150


CatBoost Classification Report:
              precision    recall  f1-score   support

           0       0.87      0.67      0.75        39
           1       0.89      0.96      0.93       111

    accuracy                           0.89       150
   macro avg       0.88      0.82      0.84       150
weighted avg       0.89      0.89      0.88       150


MLP Classification Report:
              precision    recall  f1-score   support

           0       0.26      1.00      0.41        39
           1       0.00      0.00      0.00       111

    accuracy                           0.26       150
   macro avg       0.13      0.50      0.21       150
weighted avg       0.07      0.26      0.11       150


Quadratic Discriminant Analysis Classification Report:
              precision    recall  f1-score   support

           0       0.41      0.92      0.57        39
           1       0.95      0.53      0.68       111

    accuracy                           0.63       150
   macro avg       0.68      0.73      0.62       150
weighted avg       0.81      0.63      0.65       150


Extra Trees Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.67      0.74        39
           1       0.89      0.95      0.92       111

    accuracy                           0.88       150
   macro avg       0.86      0.81      0.83       150
weighted avg       0.88      0.88      0.88       150

Best parameters for Random Forest: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}
Best F1-score: 0.9421461328806625
Best parameters for Gradient Boosting: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
Best F1-score: 0.9409593851756958
Best parameters for Naive Bayes: {}
Best F1-score: 0.2488116446766236
Best parameters for KNN: {'knn__metric': 'manhattan', 'knn__n_neighbors': 7, 'knn__weights': 'distance'}
Best F1-score: 0.9712771479423694
Best parameters for Logistic Regression: {'logreg__C': 1, 'logreg__solver': 'lbfgs'}
Best F1-score: 0.9084881053802359

Random Forest Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.67      0.74        39
           1       0.89      0.95      0.92       111

    accuracy                           0.88       150
   macro avg       0.86      0.81      0.83       150
weighted avg       0.88      0.88      0.88       150


Gradient Boosting Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.67      0.74        39
           1       0.89      0.95      0.92       111

    accuracy                           0.88       150
   macro avg       0.86      0.81      0.83       150
weighted avg       0.88      0.88      0.88       150


Naive Bayes Classification Report:
              precision    recall  f1-score   support

           0       0.30      0.92      0.45        39
           1       0.90      0.24      0.38       111

    accuracy                           0.42       150
   macro avg       0.60      0.58      0.42       150
weighted avg       0.74      0.42      0.40       150


KNN Classification Report:
              precision    recall  f1-score   support

           0       0.87      0.87      0.87        39
           1       0.95      0.95      0.95       111

    accuracy                           0.93       150
   macro avg       0.91      0.91      0.91       150
weighted avg       0.93      0.93      0.93       150


Logistic Regression Classification Report:
              precision    recall  f1-score   support

           0       0.77      0.69      0.73        39
           1       0.90      0.93      0.91       111

    accuracy                           0.87       150
   macro avg       0.83      0.81      0.82       150
weighted avg       0.86      0.87      0.86       150

[0]	train-auc:0.82316	eval-auc:0.82121
[1]	train-auc:0.82316	eval-auc:0.82121
[2]	train-auc:0.82466	eval-auc:0.82144
[3]	train-auc:0.83513	eval-auc:0.82040
[4]	train-auc:0.83513	eval-auc:0.82040
[5]	train-auc:0.84525	eval-auc:0.86232
[6]	train-auc:0.84442	eval-auc:0.85285
[7]	train-auc:0.84344	eval-auc:0.85054
[8]	train-auc:0.84471	eval-auc:0.85331
[9]	train-auc:0.84471	eval-auc:0.85331
[10]	train-auc:0.84934	eval-auc:0.86498
[11]	train-auc:0.84951	eval-auc:0.85505
[12]	train-auc:0.84995	eval-auc:0.85643
[13]	train-auc:0.84999	eval-auc:0.85643
[14]	train-auc:0.84999	eval-auc:0.85643
[15]	train-auc:0.85039	eval-auc:0.87052
[16]	train-auc:0.85077	eval-auc:0.87168
[17]	train-auc:0.85314	eval-auc:0.87156
[18]	train-auc:0.85306	eval-auc:0.87156
[19]	train-auc:0.85306	eval-auc:0.87156
[20]	train-auc:0.85306	eval-auc:0.87156
[21]	train-auc:0.85162	eval-auc:0.86394
[22]	train-auc:0.85400	eval-auc:0.85840
[23]	train-auc:0.85060	eval-auc:0.85944
[24]	train-auc:0.85049	eval-auc:0.85944
[25]	train-auc:0.85580	eval-auc:0.85840
Accuracy: 85.33%
ROC AUC: 0.86
              precision    recall  f1-score   support

           0       0.77      0.62      0.69        39
           1       0.87      0.94      0.90       111

    accuracy                           0.85       150
   macro avg       0.82      0.78      0.80       150
weighted avg       0.85      0.85      0.85       150

Epoch 1/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7420 - loss: 43462084.0000
Epoch 2/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7238 - loss: 2059577.3750
Epoch 3/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4966 - loss: 1675796.1250
Epoch 4/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6978 - loss: 3899460.2500 
Epoch 5/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6036 - loss: 712032.1875 
Epoch 6/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6346 - loss: 1453109.6250 
Epoch 7/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7205 - loss: 2453228.2500 
Epoch 8/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5295 - loss: 637491.3125 
Epoch 9/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6576 - loss: 2188584.0000 
Epoch 10/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6201 - loss: 3108840.0000
Test Accuracy: 0.7400000095367432
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
Classification Report for Deep Learning Model:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        39
           1       0.74      1.00      0.85       111

    accuracy                           0.74       150
   macro avg       0.37      0.50      0.43       150
weighted avg       0.55      0.74      0.63       150

Out[ ]:
<Sequential name=sequential, built=True>

Label¶

In [ ]:
# Split data
X_train_Le, X_test_Le, y_train_Le, y_test_Le = PreprocessingTrainTestSplit.split_data(result_df_Le, "corrisponde_ad_attacco")

# Initial model training and evaluation
InitialTraining.train_and_evaluate_initial_models(X_train_Le, y_train_Le, X_test_Le, y_test_Le)

# Hyperparameter tuning
best_models_Le = HyperparameterTuning.tune_hyperparameters(X_train_Le, y_train_Le)

# Evaluate best models on test set
evaluator_Le = ModelEvaluator(best_models_Le)
evaluation_results_Le = evaluator_Le.evaluate_models(X_test_Le, y_test_Le)

# Train XGBoost model
AdvancedModels.train_xgboost(X_train_Le, y_train_Le, X_test_Le, y_test_Le)

# Train deep learning model
DeepLearningModel.train_deep_learning_model(X_train_Le, y_train_Le, X_test_Le, y_test_Le)
Decision Tree Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.79      0.83        39
           1       0.93      0.95      0.94       111

    accuracy                           0.91       150
   macro avg       0.90      0.87      0.88       150
weighted avg       0.91      0.91      0.91       150


AdaBoost Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.54      0.66        39
           1       0.86      0.96      0.91       111

    accuracy                           0.85       150
   macro avg       0.85      0.75      0.78       150
weighted avg       0.85      0.85      0.84       150


XGBoost Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.69      0.76        39
           1       0.90      0.95      0.93       111

    accuracy                           0.89       150
   macro avg       0.87      0.82      0.84       150
weighted avg       0.88      0.89      0.88       150


CatBoost Classification Report:
              precision    recall  f1-score   support

           0       0.87      0.69      0.77        39
           1       0.90      0.96      0.93       111

    accuracy                           0.89       150
   macro avg       0.89      0.83      0.85       150
weighted avg       0.89      0.89      0.89       150


MLP Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        39
           1       0.74      1.00      0.85       111

    accuracy                           0.74       150
   macro avg       0.37      0.50      0.43       150
weighted avg       0.55      0.74      0.63       150


Quadratic Discriminant Analysis Classification Report:
              precision    recall  f1-score   support

           0       0.64      0.69      0.67        39
           1       0.89      0.86      0.88       111

    accuracy                           0.82       150
   macro avg       0.77      0.78      0.77       150
weighted avg       0.82      0.82      0.82       150


Extra Trees Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.69      0.76        39
           1       0.90      0.95      0.93       111

    accuracy                           0.89       150
   macro avg       0.87      0.82      0.84       150
weighted avg       0.88      0.89      0.88       150

Best parameters for Random Forest: {'max_depth': 20, 'min_samples_split': 2, 'n_estimators': 100}
Best F1-score: 0.9480167025010606
Best parameters for Gradient Boosting: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100}
Best F1-score: 0.9394801191834763
Best parameters for Naive Bayes: {}
Best F1-score: 0.7503471276198549
Best parameters for KNN: {'knn__metric': 'euclidean', 'knn__n_neighbors': 5, 'knn__weights': 'distance'}
Best F1-score: 0.9781855877828152
Best parameters for Logistic Regression: {'logreg__C': 0.1, 'logreg__solver': 'lbfgs'}
Best F1-score: 0.9124706106218712

Random Forest Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.69      0.76        39
           1       0.90      0.95      0.93       111

    accuracy                           0.89       150
   macro avg       0.87      0.82      0.84       150
weighted avg       0.88      0.89      0.88       150


Gradient Boosting Classification Report:
              precision    recall  f1-score   support

           0       0.88      0.77      0.82        39
           1       0.92      0.96      0.94       111

    accuracy                           0.91       150
   macro avg       0.90      0.87      0.88       150
weighted avg       0.91      0.91      0.91       150


Naive Bayes Classification Report:
              precision    recall  f1-score   support

           0       0.44      0.72      0.54        39
           1       0.87      0.68      0.76       111

    accuracy                           0.69       150
   macro avg       0.65      0.70      0.65       150
weighted avg       0.76      0.69      0.70       150


KNN Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.90      0.92        39
           1       0.96      0.98      0.97       111

    accuracy                           0.96       150
   macro avg       0.96      0.94      0.95       150
weighted avg       0.96      0.96      0.96       150


Logistic Regression Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.51      0.62        39
           1       0.85      0.95      0.90       111

    accuracy                           0.84       150
   macro avg       0.82      0.73      0.76       150
weighted avg       0.84      0.84      0.83       150

[0]	train-auc:0.82353	eval-auc:0.82190
[1]	train-auc:0.87323	eval-auc:0.79603
[2]	train-auc:0.87468	eval-auc:0.79441
[3]	train-auc:0.86802	eval-auc:0.78286
[4]	train-auc:0.88090	eval-auc:0.82121
[5]	train-auc:0.88220	eval-auc:0.84211
[6]	train-auc:0.88166	eval-auc:0.84176
[7]	train-auc:0.88310	eval-auc:0.85112
[8]	train-auc:0.88271	eval-auc:0.84442
[9]	train-auc:0.88364	eval-auc:0.85262
[10]	train-auc:0.88163	eval-auc:0.85043
[11]	train-auc:0.87650	eval-auc:0.84142
[12]	train-auc:0.88711	eval-auc:0.85690
[13]	train-auc:0.88780	eval-auc:0.85690
[14]	train-auc:0.88791	eval-auc:0.85690
[15]	train-auc:0.88821	eval-auc:0.86175
[16]	train-auc:0.88821	eval-auc:0.86175
[17]	train-auc:0.88807	eval-auc:0.86175
[18]	train-auc:0.88810	eval-auc:0.85759
[19]	train-auc:0.88900	eval-auc:0.86221
[20]	train-auc:0.88889	eval-auc:0.86198
[21]	train-auc:0.88903	eval-auc:0.86198
[22]	train-auc:0.88952	eval-auc:0.85874
[23]	train-auc:0.88908	eval-auc:0.86221
[24]	train-auc:0.88873	eval-auc:0.86221
[25]	train-auc:0.88874	eval-auc:0.86175
[26]	train-auc:0.89051	eval-auc:0.85251
[27]	train-auc:0.89013	eval-auc:0.85228
[28]	train-auc:0.88966	eval-auc:0.85505
Accuracy: 84.00%
ROC AUC: 0.85
              precision    recall  f1-score   support

           0       0.69      0.69      0.69        39
           1       0.89      0.89      0.89       111

    accuracy                           0.84       150
   macro avg       0.79      0.79      0.79       150
weighted avg       0.84      0.84      0.84       150

Epoch 1/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7210 - loss: 35000396.0000   
Epoch 2/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6672 - loss: 5387963.0000 
Epoch 3/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5509 - loss: 1828111.8750 
Epoch 4/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6714 - loss: 2025396.5000 
Epoch 5/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6830 - loss: 3374135.5000 
Epoch 6/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6411 - loss: 1886459.8750 
Epoch 7/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6435 - loss: 2256157.0000 
Epoch 8/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5381 - loss: 1076551.0000 
Epoch 9/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5985 - loss: 1026674.2500 
Epoch 10/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7046 - loss: 2105922.5000 
Test Accuracy: 0.7400000095367432
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step 
Classification Report for Deep Learning Model:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        39
           1       0.74      1.00      0.85       111

    accuracy                           0.74       150
   macro avg       0.37      0.50      0.43       150
weighted avg       0.55      0.74      0.63       150

Out[ ]:
<Sequential name=sequential_1, built=True>
In [ ]:
evaluator_OH.print_best_model('OneHot Encoder')
evaluator_Le.print_best_model('Label Encoder')
After encoding with the OneHot Encoder, the best model was KNN with a score of 0.9134

After encoding with the Label Encoder, the best model was KNN with a score of 0.9471
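Selecting the best model, as `print_best_model` reports it, comes down to an argmax over the per-model evaluation scores; a minimal sketch with hypothetical score values:

```python
# Toy evaluation scores (hypothetical values, one entry per tuned model).
scores = {"KNN": 0.9471, "Random Forest": 0.8800, "XGBoost": 0.8900}

# Pick the model whose score is highest.
best = max(scores, key=scores.get)
print(f"Best model: {best} (score {scores[best]:.4f})")
```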